
    Innovative Hybridisation of Genetic Algorithms and Neural Networks in Detecting Marker Genes for Leukaemia Cancer

    Methods for extracting marker genes that trigger the growth of cancerous cells from highly complex microarrays are of much interest to the computing community. Through the identified genes, the pathology of cancerous cells can be revealed and early precautions can be taken to prevent further proliferation of cancerous cells. In this paper, we propose an innovative hybridised gene identification framework based on genetic algorithms and neural networks to identify marker genes for leukaemia disease. Our approach confirms that high classification accuracy does not ensure that the optimal set of genes has been identified, and our model delivers a more promising set of genes even with a lower classification accuracy.
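    A minimal sketch of the GA-wrapper idea the abstract describes, assuming a scikit-learn-style setup: binary chromosomes select gene subsets, and a small neural network's cross-validated accuracy serves as the fitness function. All names, parameters and operators here are illustrative, not the authors' framework.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)

        def fitness(mask, X, y):
            # Fitness of a binary gene mask: cross-validated accuracy of a
            # small neural network trained on the selected genes only.
            if mask.sum() == 0:
                return 0.0
            clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
            return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

        def select_genes(X, y, pop_size=20, n_gen=10, p_mut=0.01):
            n_genes = X.shape[1]
            pop = rng.integers(0, 2, size=(pop_size, n_genes))
            for _ in range(n_gen):
                scores = np.array([fitness(ind, X, y) for ind in pop])
                parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep fittest half
                children = []
                while len(children) < pop_size:
                    a, b = parents[rng.integers(len(parents), size=2)]
                    cut = rng.integers(1, n_genes)                  # one-point crossover
                    child = np.concatenate([a[:cut], b[cut:]])
                    flip = rng.random(n_genes) < p_mut              # bit-flip mutation
                    children.append(np.where(flip, 1 - child, child))
                pop = np.array(children)
            scores = np.array([fitness(ind, X, y) for ind in pop])
            return pop[scores.argmax()]                             # best binary gene mask

    A wrapper of this shape makes the abstract's point concrete: two masks can reach similar cross-validated accuracy while selecting quite different gene subsets, so accuracy alone cannot certify that the marker set is optimal.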

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, and a range of techniques is examined, covering model-based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.
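    As a hedged illustration of the rule-based event correlation the report centres on, the toy sketch below raises an alert when all event types named by a rule occur for the same source within a time window; the rule format and field names are invented for the example and do not come from the report.

        from collections import defaultdict, deque

        class CorrelationRule:
            def __init__(self, name, required_types, window_seconds):
                self.name = name
                self.required = set(required_types)
                self.window = window_seconds

        def correlate(events, rules):
            # events: time-ordered iterable of (timestamp, event_type, source).
            max_window = max(rule.window for rule in rules)
            recent = defaultdict(deque)      # source -> deque of (time, type)
            alerts = []
            for t, etype, source in events:
                buf = recent[source]
                buf.append((t, etype))
                while buf and t - buf[0][0] > max_window:
                    buf.popleft()            # evict events outside every rule's window
                for rule in rules:
                    seen = {e for ts, e in buf if t - ts <= rule.window}
                    if rule.required <= seen:
                        alerts.append((t, source, rule.name))
            return alerts

    A hypothetical rule requiring, say, {"sim-swap", "high-value-call"} within 300 seconds flags a classic cloning pattern, but, exactly as the report observes, such a correlator stays blind to any misuse nobody thought to encode as a rule.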

    Optimal utilization of historical data sets for the construction of software cost prediction models

    The accurate prediction of software development cost at an early stage of the development life-cycle may have a vital economic impact and provide fundamental information for management decision making. However, it is not well understood in practice how to optimally utilize historical software project data for the construction of cost predictions. This is because the analysis of historical data sets for software cost estimation leads to many practical difficulties. In addition, there has been little research done to prove the benefits. To overcome these limitations, this research proposes a preliminary data analysis framework, which is an extension of Maxwell's study. The proposed framework is based on a set of statistical analysis methods such as correlation analysis, stepwise ANOVA and univariate analysis, and provides a formal basis for the construction of cost prediction models from historical data sets. The proposed framework is empirically evaluated against commonly used prediction methods, namely Ordinary Least-Squares Regression (OLS), Robust Regression (RR), Classification and Regression Trees (CART) and K-Nearest Neighbour (KNN), and is also applied to both heterogeneous and homogeneous data sets. Formal statistical significance testing was performed for the comparisons. The results from the comparative evaluation suggest that the proposed preliminary data analysis framework is capable of constructing more accurate prediction models for all selected prediction techniques. The predictor variables processed by the framework are statistically significant at the 95% confidence level for both parametric techniques (OLS and RR) and one non-parametric technique (CART). Both the heterogeneous and the homogeneous data sets benefit from the application of the proposed framework for improving project effort prediction accuracy, with the homogeneous data set benefiting more. Overall, the evaluation results demonstrate that the proposed framework has excellent applicability. Further research could focus on two main purposes: first, improve the applicability by integrating missing data techniques such as listwise deletion (LD), mean imputation (MI), etc., for handling missing values in historical data sets; second, apply benchmarking to enable comparisons, i.e. allowing companies to compare themselves with respect to their productivity or quality.
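    A hedged sketch of the evaluation setup described above, assuming pandas and scikit-learn: predictors are screened by a simple correlation filter (a crude stand-in for the fuller correlation/ANOVA framework), then the four model families are compared by cross-validated error. The function names, the 0.3 threshold and the error metric are all illustrative assumptions.

        from sklearn.linear_model import HuberRegressor, LinearRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.tree import DecisionTreeRegressor

        def screen_predictors(df, target, min_abs_corr=0.3):
            # Keep numeric variables whose absolute correlation with the
            # target clears a threshold -- a proxy for preliminary analysis.
            corr = df.corr(numeric_only=True)[target].drop(target)
            return corr[corr.abs() >= min_abs_corr].index.tolist()

        def compare_models(df, target):
            predictors = screen_predictors(df, target)
            X, y = df[predictors].to_numpy(), df[target].to_numpy()
            models = {
                "OLS": LinearRegression(),
                "RR": HuberRegressor(),      # robust-regression stand-in
                "CART": DecisionTreeRegressor(max_depth=4),
                "KNN": KNeighborsRegressor(n_neighbors=3),
            }
            return {name: -cross_val_score(model, X, y, cv=5,
                                           scoring="neg_mean_absolute_error").mean()
                    for name, model in models.items()}

    Running the same comparison once on the raw predictors and once on the screened ones is the shape of experiment the thesis reports: the framework's value shows up as a consistent drop in cross-validated error across all four techniques.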

    Vector representations of structured data

    The connectionist approach to creating vector representations (VREPs) of structured data is usually implemented by artificial neural network (ANN) architectures. ANNs are trained on a representative corpus and can then demonstrate some degree of generalization to novel data. In this context, structured data are typically trees, the leaf nodes of which are assigned some n-element (often binary) vector representation. The strategy used to encode the leaf data and the width of the consequent vectors can have an impact on the encoding performance of the ANN architecture. In this thesis the architecture of principal interest is called simplified recursive auto-associative memory, (S)RAAM, which was devised to provide a theoretical model for another architecture called recursive auto-associative memory, RAAM. Research continues on RAAMs in terms of improving their learning ability, understanding the features that are encoded and improving generalization. (S)RAAM is a mathematical model that lends itself more readily to addressing these issues. Usually ANNs designed to encode structured data will, as a result of training, simultaneously create an encoder function to transform the data into vectors and a decoder function to perform the reverse transformation. (S)RAAM, as a model of this process, was designed to follow this paradigm. It is shown that this is not strictly necessary and that encoder and decoder functions can be created at separate times, their connection being maintained by the data upon which they operate. This leads to a new, more versatile model called, in this thesis, the General Encoder Decoder, GED. The GED, like (S)RAAM, is implemented as an algorithm rather than a neural network architecture. The thesis contends that the broad scope of the GED model makes it a versatile experimental vehicle supporting research into key properties of VREPs. In particular these properties include the strategy used to encode the leaf tokens within tree structures and the features of these structures that are preferentially encoded.
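    The following is a structural sketch, not the thesis's algorithm, of the separately created encoder and decoder that the GED permits: each compresses or expands fixed-width vectors recursively over binary trees, with random matrices standing in for whatever a real construction or training procedure would produce.

        import numpy as np

        rng = np.random.default_rng(0)
        N = 16                                           # width of every VREP
        W_enc = rng.normal(scale=0.1, size=(N, 2 * N))   # assumed encoder transform
        W_dec = rng.normal(scale=0.1, size=(2 * N, N))   # assumed decoder transform

        def encode(tree):
            # tree: a leaf vector (np.ndarray) or a (left, right) pair of subtrees.
            if isinstance(tree, np.ndarray):
                return tree
            left, right = tree
            pair = np.concatenate([encode(left), encode(right)])
            return np.tanh(W_enc @ pair)                 # two vectors -> one vector

        def decode(vec, depth):
            # Expand 'depth' levels of structure back out of a single vector.
            if depth == 0:
                return vec
            pair = W_dec @ np.arctanh(np.clip(vec, -0.999, 0.999))
            return (decode(pair[:N], depth - 1), decode(pair[N:], depth - 1))

    The point the sketch makes is the thesis's own: nothing ties encode and decode to a single jointly trained network; they only have to agree on the vectors passing between them.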

    Mintram (2006): "Using industry based data sets in software engineering research"

    This paper describes the use of software project development data obtained from industry-based projects. It argues for the importance of carrying out a preliminary data analysis procedure for software development cost estimation. The paper also presents the limitations of using these industrial data (ISBSG R9 and Bank63 data) based on the above research. The current state of the research and further work are discussed.

    Targeted projection pursuit for visualizing gene expression data classifications

    We present a novel method for finding low-dimensional views of high-dimensional data: Targeted Projection Pursuit. The method proceeds by finding projections of the data that best approximate a target view. Two versions of the method are introduced: one based on Procrustes analysis and one based on an artificial neural network. These versions are capable of finding orthogonal or non-orthogonal projections, respectively. The method is quantitatively and qualitatively compared with other dimension reduction techniques. It is shown to find 2D views that display the classification of cancers from gene expression data with a visual separation equal to, or better than, existing dimension reduction techniques.
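    The Procrustes variant admits a compact sketch: given data X and a 2-D target view T (say, class labels mapped to well-separated points in the plane), the orthonormal projection P minimising ||XP - T|| in the Frobenius norm comes from the SVD of the cross-covariance. This is a hedged reconstruction from the abstract, not the authors' code, and class_target is an invented helper.

        import numpy as np

        def targeted_projection(X, T):
            # X: (n, p) data matrix; T: (n, 2) target view.
            # Returns P: (p, 2) with orthonormal columns minimising ||X P - T||_F.
            U, _, Vt = np.linalg.svd(X.T @ T, full_matrices=False)
            return U @ Vt

        def class_target(y, positions):
            # Illustrative target: send every sample of a class to one point.
            return np.array([positions[label] for label in y])

        # view = X @ targeted_projection(X, class_target(y, {0: (-1, 0), 1: (1, 0)}))

    Because P has orthonormal columns, the resulting view X @ P is a genuine rotation-like projection of the data; the neural-network variant trades that constraint away for more flexible, non-orthogonal views.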

    Spatial solitons in χ(2) planar photonic crystals

    We analyze light self-confinement induced by multiple nonlinear resonances in a two-dimensional χ(2) photonic crystal. With reference to second-harmonic generation in a hexagonal lattice, we show that the system can not only support two-color (1+1)D solitary waves with enhanced confinement and steering capabilities but also enable novel features such as wavelength-dependent soliton routing.
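    For orientation only: the standard normalised model for planar two-color quadratic solitons (fundamental u, second harmonic v, propagation coordinate ξ, transverse coordinate s, phase mismatch β) is commonly written as below. This is the canonical homogeneous-medium model, included as a hedged reference point, not the photonic-crystal equations of the paper.

        i \frac{\partial u}{\partial \xi} + \frac{1}{2} \frac{\partial^2 u}{\partial s^2} + u^* v = 0,
        \qquad
        i \frac{\partial v}{\partial \xi} + \frac{1}{4} \frac{\partial^2 v}{\partial s^2} - \beta v + \frac{u^2}{2} = 0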